Mapping Multi-Layer Bayesian LDA to Massively Parallel Supercomputers
Authors
Abstract
LDA, short for Latent Dirichlet Allocation, is a hierarchical Bayesian model for content analysis. LDA has seen a wide variety of applications, but it also presents computational challenges because iterative computation of approximate inference is required. Recently an approach based on Gibbs sampling and MPI was proposed to address these challenges; this report presents work that maps that approach to a massively parallel supercomputer, Blue Gene. The work enhances runtime performance by exploiting special hardware features of Blue Gene, such as the dual floating-point unit, and by applying general programming/compiling techniques such as loop unfolding. Results from an empirical evaluation on a real-world, large-scale data set indicate the following findings. First, the dual floating-point unit contributes a significant performance gain, and thus it should be considered in the design of processors for computationally intensive machine learning applications. Second, although loop unfolding is a simple technique supported by most compilers, it improves performance even further. Since loop unfolding is general enough to apply to other platforms, this report suggests that compilers should perform loop unfolding more intelligently.
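To illustrate the loop-unfolding (unrolling) technique the abstract refers to, here is a minimal sketch in C. This is not code from the paper: the function `dot_unrolled` and its 4-way unroll factor are illustrative assumptions. The idea is that an unrolled inner loop (such as the per-topic probability accumulation inside a Gibbs sampler) reduces branch overhead and exposes independent multiply-adds that a paired floating-point unit like Blue Gene's can issue together.

```c
#include <stddef.h>

/* Hypothetical example (not from the paper): a dot product with the loop
   manually unfolded by a factor of 4. The four accumulators s0..s3 carry
   independent chains of multiply-adds, so the hardware can overlap them;
   a remainder loop handles lengths that are not multiples of 4. */
double dot_unrolled(const double *a, const double *b, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {      /* unrolled body: 4 iterations per trip */
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) {              /* remainder loop for leftover elements */
        s0 += a[i] * b[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Compilers can apply this transformation automatically (for example, GCC with `-funroll-loops`), but as the report notes, the choice of unroll factor and accumulator count often benefits from per-architecture tuning.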
Similar resources
Molecular simulation of complex systems using massively parallel supercomputers
Massively parallel supercomputers, such as the 150 Gigaflop Intel Paragons located at Oak Ridge National Laboratory and Sandia National Laboratories, make possible molecular simulation of systems of unprecedented complexity and realism. We describe some of the issues related to efficient implementation of molecular dynamics and Monte Carlo simulations on massively parallel supercomputers. The a...
Lattice QCD with Commodity Hardware and Software
Large-scale QCD Monte Carlo calculations have typically been performed on either commercial supercomputers or specially built massively parallel computers such as Fermilab's ACPMAPS. Commodity computer systems offer impressive floating-point performance-to-cost ratios which exceed those of commercial supercomputers. As high-performance networking components approach commodity pricing, it becomes...
A Visual Analytics System for Optimizing Communications in Massively Parallel Applications
Current and future supercomputers have tens of thousands of compute nodes interconnected with high-dimensional networks and complex network topologies for improved performance. Application developers are required to write scalable parallel programs in order to achieve high throughput on these machines. Application performance is largely determined by efficient inter-process communication. A com...
Tuning HipGISAXS on Multi and Many Core Supercomputers
With the continual development of multi- and many-core architectures, there is a constant need for architecture-specific tuning of application codes in order to realize high computational performance and energy efficiency, closer to the theoretical peaks of these architectures. In this paper, we present optimization and tuning of HipGISAXS, a parallel X-ray scattering simulation code [1], on vario...
Multi-Million Particle Molecular Dynamics on MPPs
We discuss the computational difficulties associated with performing large-scale molecular dynamics simulations involving more than 100 million atoms on modern massively parallel supercomputers. We discuss various performance and memory optimization strategies along with the method we have used to write a highly portable parallel application. Finally, we discuss some recent work addressing the pr...
Publication date: 2011